An in-silico exploration of an evolutionarily convergent TGF-𝛽 mimetic from a parasitic nematode Heligmosomoides polygyrus bakeri

Author

Joe Boktor

Published

February 10, 2023

Background

Previous work has demonstrated that an intestinal helminth parasite (Heligmosomoides polygyrus) is able to expand host regulatory T-cell populations through activation of TGF-𝛽 receptors(Johnston et al. 2017). Further work went on to identify a protein, H. polygyrus TGF-β mimic (Hp-TGM), which potently binds host TGF-𝛽 receptors; however, this protein is reported to display little to no sequence homology to human TGF-𝛽.

Objectives

In this notebook, we will explore the similarities and differences between human TGF-𝛽 (hTGF-𝛽) and the H. polygyrus TGF-β mimic (Hp-TGM) using a multi-layered in-silico strategy. First, we will compare amino acid sequences between the two proteins searching for local, global, and motif homology. Second, we will employ alphafold multimer for complex structure prediction, and explore the structures of hTGF-𝛽 + hTGF-𝛽 receptor(s) in comparison to HP-TGM + hTGF-𝛽 receptor(s). We will then further improve and our complex predictions using EvoDock (an evolutionary algorithm application of RosettaDock) and determine binding residues and complex binding energies. These analyses will enable quantitative comparisons between sequence, structure, and docking based computational strategies and ultimately guide the development of a high-throughput in-silico screen for human immune system host-microbe-interactions (HMI).

Analysis

Setup

Code
library(tidyverse)
library(reticulate)
library(magrittr)
library(glue)
library(seqinr)
library(future)
library(batchtools)
library(future.batchtools)
library(fs)
library(tictoc)
library(listenv)
library(progress)
library(viridis)
library(bio3d)
library(r3dmol)
library(strex)
library(plotly)
library(ggsci)

tmpdir <- "/central/scratch/jbok/tmp"
homedir <- "/central/groups/MazmanianLab/joeB"
wkdir <- glue(
  "{homedir}/",
  "Microbiota-Immunomodulation/Microbiota-Immunomodulation"
)
src_dir <- glue("{wkdir}/notebooks")
source(glue("{src_dir}/R-scripts/helpers_general.R"))
source(glue("{src_dir}/R-scripts/helpers_sequence-screens.R"))
source(glue("{src_dir}/R-scripts/rhmmer-package.R"))
protein_catalogs <- glue("{homedir}/", "Downloads/protein_catalogs")
shell_do(glue("mkdir -p {wkdir}/data/interim/fastas/raw/monomers"))
shell_do(glue("mkdir -p {wkdir}/data/interim/fastas/processed/monomers"))
shell_do(glue("mkdir -p {wkdir}/data/interim/fastas/processed/complexes"))
Code
gget_seq <- function(ensembl_id, amino_acid = TRUE) {
  seq_results <- gget$seq(ensembl_id, translate = amino_acid)
  return(seq_results[2])
}
tgf_keys <- c("TGFB1", "TGFB2", "TGFB3", "TGFBR1", "TGFBR2", "TGFBR3")
tgfb_search <- gget$search(tgf_keys, "homo_sapiens")
tgfb_search %<>% filter(biotype == "protein_coding", gene_name %in% tgf_keys)
# collect amino acid sequences for each protein using the ensembl_id
tgfb_search %<>% mutate(sequences_aa =  map_chr(ensembl_id, gget_seq))

# Manually editing sequences - removing unwanted subdomains within pro-proteins
HP_TMG_SEQ <- "DDSGCMPFSDEAATYKYVAKGPKNIEIPAQIDNSGMYPDYTHVKRFCKGLHGEDTTGWFVGICLASQWYYYEGVQECDDRRCSPLPTNDTVSFEYLKATVNPGIIFNITVHPDASGKYPELTYIKRICKNFPTDSNVQGHIIGMCYNAEWQFSSTPTCPASGCPPLPDDGIVFYEYYGYAGDRHTVGPVVTKDSSGNYPSPTHARRRCRALSQEADPGEFVAICYKSGTTGESHWEYYKNIGKCPDPRCKPLEANESVHYEYFTMTNETDKKKGPPAKVGKSGKYPEHTCVKKVCSKWPYTCSTGGPIFGECIGATWNFTALMECINARGCSSDDLFDKLGFEKVIVRKGEGSDSYKDDFARFYATGSKVIAECGGKTVRLECSNGEWHEPGTKTVHRCTKDGIRTL"
TGFB1 <- "ALDTNYCFSSTEKNCCVRQLYIDFRKDLGWKWIHEPKGYHANFCLGPCPYIWSLDTQYSKVLALYNQHNPGASAAPCCVPQALEPLPIVYYVGRKPKVEQLSNMIVRSCKCS"
TGFB2 <- "ALDAAYCFRNVQDNCCLRPLYIDFKRDLGWKWIHEPKGYNANFCAGACPYLWSSDTQHSRVLSLYNTINPEASASPCCVSQDLEPLTILYYIGKTPKIEQLSNMIVKSCKCS"
TGFB3 <- "ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSADTTHSTVLGLYNTLNPEASASPCCVPQDLEPLTILYYVGRTPKVEQLSNMVVKSCKCS"

hp_mimic_data <-
  tribble(
    ~gene_name, ~sequences_aa,
    "HP-TGM",   HP_TMG_SEQ
  )
tgfb_search %<>% bind_rows(hp_mimic_data)

saveRDS(tgfb_search, glue(
  "{wkdir}/data/interim/tmp/",
  "{Sys.Date()}_TGFB_gget.rds"
))

Saving individual sequences in FASTA format.

Code
# Save individual fasta files for each protein -----
tgfb_search <- readRDS(glue(
  "{wkdir}/data/interim/tmp/",
  "2023-02-27_TGFB_gget.rds"
))

# Manually editing the fasta sequences to remove the signal peptide and other non-active domains
tgfb_search %<>% 
  mutate(sequences_aa = case_when(
    gene_name == "TGFB1" ~ TGFB1,
    gene_name == "TGFB2" ~ TGFB2,
    gene_name == "TGFB3" ~ TGFB3,
    TRUE ~ sequences_aa  
  ))

fastas_raw_dir <- glue("{wkdir}/data/interim/fastas/raw/monomers")
for (tf in tgfb_search$gene_name) {
  tf_aa <- tgfb_search %>%
    filter(gene_name == tf) %>%
    pull(sequences_aa)
  fasta_path <- glue("{fastas_raw_dir}/{tf}.fasta")
  write.fasta(
    sequences = tf_aa,
    names = tf,
    file.out = fasta_path,
    open = "w",
    nbchar = 60,
    as.string = FALSE
  )
}

Applying SignalP 6.0 to predict and remove signal peptides from fasta files.

Code
fastas_raw_paths <- paste0(
  fastas_raw_dir, "/", 
  list.files(fastas_raw_dir, pattern = "fasta")
)
names(fastas_raw_paths) <- fastas_raw_paths %>% 
  purrr::map(~fs::path_ext_remove(basename(.)))

fastas_processed_tgfb <- list()
for (id in names(fastas_raw_paths)) {
  fasta_path <- fastas_raw_paths[[id]]
  fastas_processed_tgfb[[id]] <- glue(
    "{wkdir}/data/interim/fastas/processed/monomers/{id}"
    )
  if (dir.exists(fastas_processed_tgfb[[id]])) {
    next
  } 
  message("Processing ", id, " fasta file...")
  slurm_shell_do(
    glue(
      "conda run -n signalp6",
      " signalp6",
      " --organism eukarya",
      " --fastafile {fasta_path}",
      " --output_dir {fastas_processed_tgfb[[id]]}",
      " --write_procs 8",
      " --format all",
      " --mode fast"
      ),
      ncpus = 8
  )
}

# IF no signal peptide is found, the fasta file is copied to the processed dir
for (id in names(fastas_raw_paths)) {
  raw_fasta <- fastas_raw_paths[[id]]
  processed_fasta <- glue(
    "{fastas_processed_tgfb[[id]]}/",
    "processed_entries.fasta"
    )
  if (file.info(processed_fasta)$size == 0){
    message("Overwriting ", id, " processed fasta file...")
    shell_do(glue("cp {raw_fasta} {processed_fasta}"))
  }
}

Saving FASTA files for complex predictions

Code
# Saving protein complexes (post-processing) ----
tgfb_search <- readRDS(glue(
  "{wkdir}/data/interim/tmp/",
  "2023-02-27_TGFB_gget.rds"
))
# Split into TFs and Receptors and combine all possible pairs
tgf_receptors <- tgfb_search %>% 
  filter(grepl("receptor", ensembl_description)) %>% 
  pull(gene_name)
tgf_tf <- tgfb_search %>% 
  filter(!grepl("receptor", ensembl_description)) %>% 
  pull(gene_name)

format_seq <- function(fasta) {
  fasta %>% 
    seqinr::getSequence() %>% 
    unlist() %>%
    paste(collapse = "")
}

# Saving paired fasta files
fastas_merged_path <- glue("{wkdir}/data/interim/fastas/processed/complexes")
for (tf in tgf_tf) {
  tf_aa <- seqinr::read.fasta(
      glue("{fastas_processed_tgfb[[tf]]}/processed_entries.fasta"),
      seqtype = "AA"
      )
  for (receptor in tgf_receptors) {
    rec_aa <- seqinr::read.fasta(
      glue("{fastas_processed_tgfb[[receptor]]}/processed_entries.fasta"),
      seqtype = "AA"
      )
    merged_name <- paste(
      seqinr::getName(tf_aa), 
      seqinr::getName(rec_aa), 
      sep = "_"
      )
    merged_fasta_path <- glue("{fastas_merged_path}/{merged_name}.fasta")
    if (file.exists(merged_fasta_path)) {
      next
    }
    message("Saving: ", merged_name)
    write.fasta(
      sequences = list(format_seq(tf_aa), format_seq(rec_aa)),
      names =list(seqinr::getName(tf_aa), seqinr::getName(rec_aa)),
      file.out = merged_fasta_path,
      open = "w",
      nbchar = 60,
      as.string = FALSE
    )
  }
}

# Saving trioed fasta files
rec1 <- seqinr::read.fasta(
  glue("{fastas_processed_tgfb[['TGFBR1']]}/processed_entries.fasta"),
  seqtype = "AA"
  )
rec2 <- seqinr::read.fasta(
  glue("{fastas_processed_tgfb[['TGFBR2']]}/processed_entries.fasta"),
  seqtype = "AA"
  )
for (tf in tgf_tf) {
  tf_aa <- seqinr::read.fasta(
      glue("{fastas_processed_tgfb[[tf]]}/processed_entries.fasta"),
      seqtype = "AA"
      )
  merged_name <- paste(seqinr::getName(tf_aa), "TGFBR1_TGFBR2", sep = "_")
  merged_fasta_path <- glue("{fastas_merged_path}/{merged_name}.fasta")
  if (file.exists(merged_fasta_path)) {
    next
  }
  message("Saving: ", merged_name)
  write.fasta(
    sequences = list(
      format_seq(tf_aa), 
      format_seq(rec1),
      format_seq(rec2)
      ),
    names =list(
      seqinr::getName(tf_aa),
      seqinr::getName(rec1), 
      seqinr::getName(rec2)
      ),
    file.out = merged_fasta_path,
    open = "w",
    nbchar = 60,
    as.string = FALSE
  )
}

Sequence Similarity (Local & Global)

Creating a dataframe of job metadata for slurm batchtools.

Code
sequence_analysis_dir <- glue(
  "{wkdir}/data/interim/",
  "tgfb_analyses/sequence-alignment"
)
tgfb_ligand_dir <- glue("{wkdir}/data/interim/alphafold-multimer/TGFB_ligands")
shell_do(glue("mkdir -p {sequence_analysis_dir}"))
shell_do(glue("mkdir -p {tgfb_ligand_dir}"))

seqinr::write.fasta(
  sequences = as.list(tgf_tf$sequences_aa),
  names = as.list(tgf_tf$gene_name),
  file.out = glue("{tgfb_ligand_dir}/tgfb_ligands.fasta"),
  open = "w",
  nbchar = 60,
  as.string = FALSE
)

alignment_df_list <- bind_rows(
  tgf_tf %>%
    mutate(ensembl_id = gene_name) %>%
    dplyr::select(ensembl_id, sequences_aa) %>%
    mutate(method = "needle"),
  tgf_tf %>%
    mutate(ensembl_id = gene_name) %>%
    dplyr::select(ensembl_id, sequences_aa) %>%
    mutate(method = "water")
) %>%
  mutate(
    refdb_path = glue("{tgfb_ligand_dir}/tgfb_ligands.fasta"),
    output_dir = sequence_analysis_dir
  )

alignment_df_list %>% glimpse()

Executing slurm sequence alignment jobs.

Code
# configure registry ----
cluster_run <- glue("{Sys.Date()}_TGFB-EMBOSS-Alignment_ID-{rand_string()}/")
message("\n\nRUNNING:  ", cluster_run, "\n")
breg <- makeRegistry(
  file.dir = glue(
    "{wkdir}/.cluster_runs/",
    cluster_run
  ),
  seed = 42
)
breg$cluster.functions <- batchtools::makeClusterFunctionsSlurm(
  template = glue("{wkdir}/batchtools_templates/batchtools.slurm.tmpl"),
  scheduler.latency = 0.5,
  fs.latency = 65
)
# Submit Jobs ----
jobs <- batchMap(
  fun = align_fasta_sequences,
  args = alignment_df_list,
  reg = breg
)
setJobNames(jobs,
  paste("TGFB-EMBOSS",
  alignment_df_list$ensembl_id,
  alignment_df_list$method,
  sep = "_"),
  reg = breg
)
getJobNames(jobs, reg = breg)
submitJobs(jobs,
  resources = list(
    walltime = 3600,
    memory = 1024,
    ncpus = 1,
    max.concurrent.jobs = 9999
  )
)

Aggregating alignment files.

Code
file_paths <- list.files(sequence_analysis_dir, pattern = "rds$", full.names = TRUE)
needle_files <- file_paths %>%
  keep(grepl("HP-TGM_tgfb_ligands.needle.rds", .))
water_files <- file_paths %>%
  keep(grepl("HP-TGM_tgfb_ligands.water.rds", .))

tgfb_alignments <-  
  bind_rows(
    file_paths %>% 
      keep(grepl("needle", .)) %>% 
      load_rds_files() %>% 
      mutate(method = "Needleman-Wunsch (global) "),
    file_paths %>% 
      keep(grepl("water", .)) %>% 
      load_rds_files() %>% 
      mutate(method = "Smith-Waterman (local)")
  )

saveRDS(tgfb_alignments,
  glue("{wkdir}/data/interim/tgfb_analyses/sequence-alignment-df.rds")
  )
Code
tgfb_alignments <- 
  readRDS(
    glue("{wkdir}/data/interim/tgfb_analyses/sequence-alignment-df.rds")
  )

# Plotting sequence alignment data
alignment_heatmap <- tgfb_alignments %>% 
  ggplot(aes(x = protein_1, y = protein_2, fill = Score)) +
  geom_tile() +
  geom_text(aes(label = paste0(percent_identity, "%\n", percent_similarity, "%")), 
    color = "white", size = 3) +
  theme_bw() +
  facet_wrap( ~ method) +
  scale_x_discrete(expand=c(0,0)) +
  scale_y_discrete(expand=c(0,0)) +
  scale_fill_viridis_c(option = "G") +
  labs(x = "", y = "", fill = "Alignment\nScore") +
  theme(axis.text.x = element_text(angle=45, hjust=1) )

ggsave(alignment_heatmap, 
  filename = glue("{wkdir}/figures/tgfb-analysis/sequence-alignment-heatmap.png"),
  dpi = 600, width = 7, height = 3
)

Figure 1: Local and global pairwise sequence alignment.

This figure depicts pairwise sequence alignment scores between all human TGF-β isomers I-III and the H. polygyrus TGF-β mimic (Hp-TGM). Color of the heatmap represents the alignment score and the top and bottom values within each cell indicate the percent identity and similarity of a sequence pair, respectively. These results indicate that human TGF-β isomers share a moderate degree of sequence similarity within isomer comparisons, but very little similarity with HP-TGM. This is indicated by a decrease of approximately two-fold in the percentage similarity observed in inter-species comparisons.

Hidden Markov Model Similarity

Creating MSA profiles for each protein using jackhmmer against UniProt.

Code
tgfb_fasta_dir <- glue("{wkdir}/data/interim/alphafold-multimer/TGFB_fasta")
tgfb_fasta_paths <- paste0(
  tgfb_fasta_dir, "/", 
  list.files(tgfb_fasta_dir, pattern = "fasta") %>% keep(!grepl("_", .))
)

for (fasta_path in tgfb_fasta_paths) {
  id <- fs::path_ext_remove(basename(fasta_path))
  output_dir <- glue(
    "{wkdir}/data/processed/",
    "jackhmmer_results/{id}"
  )
  shell_do(glue("mkdir -p {output_dir}"))
  if (!file.exists(glue("{output_dir}/{id}_msa.fasta"))) {
    slurm_shell_do(
      cmd = glue(
        "jackhmmer ",
        " --cpu 8",
        " -N 5",
        " -E 0.05",
        " -A {output_dir}/{id}_msa.fasta",
        " -o {output_dir}/{id}_jackhmmer.out",
        " --tblout {output_dir}/{id}_seqhits.txt",
        " --domtblout {output_dir}/{id}_domainhits.txt",
        " --noali",
        " {fasta_path}",
        " /central/software/alphafold/data/uniprot/uniprot.fasta"
      ),
      jobname = glue("jkhmr_{fasta_path}_{rand_string()}"),
      working_dir = wkdir,
      ncpus = 8,
      memory = "4G",
      walltime = 36000
    )
  }
}

Creating HMM profiles from MSAs and estimating sequence similiarity using Hmmsearch on custom protein catalog.

Code
protein_catalog_path <- glue(
  "{homedir}/Downloads/protein_catalogs/clustered_catalogs",
  "/merged/2023-02-13_catalog_MMSeq2-95_rep_seq.fasta"
)
ids <- tgfb_fasta_paths %>%
  purrr::map(~ fs::path_ext_remove(basename(.)))

for (id in ids){
  input_dir <- glue(
    "{wkdir}/data/processed/",
    "jackhmmer_results/{id}"
  )
  output_dir <- glue(
    "{wkdir}/data/processed/",
    "hmmersearch_results/{id}"
  )
  shell_do(glue("mkdir -p {output_dir}"))
  # make hmm profile from MSA
  fasta_path <- glue("{input_dir}/{id}_msa.fasta")
  shell_do(glue("hmmbuild {output_dir}/{id}.hmm {fasta_path}"))
  # hmmsearch on protein catalog using hmm profile
  slurm_shell_do(
    cmd = glue(
      "hmmsearch ",
      " --cpu 8",
      " -A {output_dir}/{id}_msa.hmm",
      " -o {output_dir}/{id}_hmmsearch.out",
      " --tblout {output_dir}/{id}_tblout.txt",
      " --domtblout {output_dir}/{id}_domtblout.txt",
      " --noali",
      " {output_dir}/{id}.hmm",
      " {protein_catalog_path}"
    ),
    jobname = glue("hmrsearch_{fasta_path}_{rand_string()}"),
    working_dir = wkdir,
    ncpus = 8,
    memory = "4G",
    walltime = 36000
  )
}

Visualizing pHMM results.

Code
catalogs <- c(
  "ELGG",
  "Hadza",
  "KIJ",
  "MGV",
  "RUGS",
  "RUMMETA",
  "UHGP",
  "VEuPathDB",
  "Wormbase"
)

catalogs_cols <- pal_npg()(length(catalogs))
names(catalogs_cols) <- catalogs

hmmer_results_dir <- glue("{wkdir}/data/interim/tgfb_analyses/hmmersearch_results/")
hmmer_results_paths <- list.dirs(
  glue("{wkdir}/data/interim/tgfb_analyses/hmmersearch_results/"), 
  recursive = FALSE)

hmmer_hit_plots <- list()
for (id in basename(hmmer_results_paths)) {
  sequence_hits <- read_tblout(
  glue("{hmmer_results_dir}/{id}/{id}_tblout.txt")
  )
  hmmer_hit_plots[[id]] <- sequence_hits %>% 
    mutate(catalog = map_chr(
        str_before_nth(domain_name, "_", 2), 
        ~ str_remove(., "CATID_"))
        ) %>%
      ggplot(aes(
        x = -log10(sequence_evalue), 
        y = -log10(best_domain_evalue),
        group = description
        )) +
      scale_color_manual(values = catalogs_cols) +
      geom_point(aes(color = catalog)) +
      theme_bw()
}
Code
ggplotly(hmmer_hit_plots$`HP-TGM` )

Figure 2: HP-TGM profile Hidden Markov Model hits (E value <= 0.05) against our custom microbial protein catalog. The x and y axes display E-value signifiance for the overall sequence and the best domain, respectively. Points are colored by the subcatalog of origin.

Code
ggplotly(hmmer_hit_plots$TGFB1)

Figure 3: TGF-β1 profile Hidden Markov Model hits (E value <= 0.05) against our custom microbial protein catalog. The x and y axes display E-value signifiance for the overall sequence and the best domain, respectively. Points are colored by the subcatalog of origin.

Code
ggplotly(hmmer_hit_plots$TGFB2)

Figure 4: TGF-β2 profile Hidden Markov Model hits (E value <= 0.05) against our custom microbial protein catalog. The x and y axes display E-value signifiance for the overall sequence and the best domain, respectively. Points are colored by the subcatalog of origin.

Code
ggplotly(hmmer_hit_plots$TGFB3)

Figure 5: TGF-β3 profile Hidden Markov Model hits (E value <= 0.05) against our custom microbial protein catalog. The x and y axes display E-value signifiance for the overall sequence and the best domain, respectively. Points are colored by the subcatalog of origin.

Key Conclusions:

  • HP-TGM shares little to no sequence similarity with human human TGF-β isomers as indicated by local and global alignment.
  • When analyzing sequence motifs, HP-TGM displays similarity to Complement Control Protein (CCP, or Sushi) family domains, suggesting an entirely alternative origin than human TGF-β.

Structural Analysis and Complex Predictions

Executing alpha-fold multimer models

Code
cluster_reports <- glue(
  "{wkdir}/.cluster_runs/",
  "{Sys.Date()}_AlphaFold-Multimer_TGFB-trimmedseqs_{rand_string()}"
)
message("\n\n CREATING:  ", cluster_reports, "\n")
shell_do(glue("mkdir -p {cluster_reports}"))

complex_fastas <-
  list.files(
    glue("{wkdir}/data/interim/fastas/processed/complexes"),
    # glue("{wkdir}/data/interim/alphafold-multimer/TGFB_fasta"),
    full.names = TRUE
  )

for (complex_path in complex_fastas) {
  jname <- fs::path_ext_remove(basename(complex_path))
  slurm_run_alphafold(
    jobname = jname,
    slurm_out = cluster_reports,
    input_fasta = complex_path,
    mem = "128G",
    gpus = 1,
    cpus_per_task = 8,
    output_dir = glue("{wkdir}/data/interim/alphafold-multimer/2023-02-28_TGFB_complexes/{jname}")
  )
}
Code
receptor_ligand_pairs <- c(
  "TGFB1_TGFBR1",
  "TGFB2_TGFBR2",
  "TGFB3_TGFBR1_TGFBR2",
  "TGFB2_TGFBR1",
  "TGFB2_TGFBR2",
  "TGFB2_TGFBR1_TGFBR2",
  "TGFB3_TGFBR1",
  "TGFB3_TGFBR2",
  "TGFB3_TGFBR1_TGFBR2",
  "HP-TGM_TGFBR1",
  "HP-TGM_TGFBR2",
  "HP-TGM_TGFBR1_TGFBR2"
)

pdb_models <- list()
for (pair in receptor_ligand_pairs) {
  pdb_data <- bio3d::read.pdb(
    glue(
      "{wkdir}/data/interim/alphafold-multimer/",
      "2023-02-28_TGFB_complexes/{pair}/{pair}",
      "/ranked_0.pdb"
    ),
    multi = TRUE
  )
  
  pdb_models[[pair]] <- r3dmol(
    viewer_spec = m_viewer_spec(
      cartoonQuality = 2,
      lowerZoomLimit = 50,
      upperZoomLimit = 2000
    )
  ) %>%
    m_add_model(data = m_bio3d(pdb_data)) %>%
    m_add_surface(style = m_style_surface(opacity = 0.4)) %>%
    m_set_style(style = m_style_cartoon()) %>%
    m_set_style(sel = m_sel(chain = "B"),
                style = m_style_cartoon(color = "#FFD105")) %>%
    m_set_style(sel = m_sel(chain = "C"),
                style = m_style_cartoon(color = "#FC027E")) %>% 
    m_zoom_to() %>% 
    m_rotate(angle = 90, axis = "y") %>%
    m_spin()
  
}

saveRDS(pdb_models,
     glue("{wkdir}/data/interim/tgfb_analyses/{Sys.Date()}_r3dmol-complexes.rds")
     )
Code
pdb_models <- readRDS(
     glue("{wkdir}/data/interim/tgfb_analyses/2023-03-02_r3dmol-complexes.rds")
     )
Code
pdb_models$TGFB3_TGFBR1
Code
pdb_models$TGFB3_TGFBR2
Code
pdb_models$TGFB3_TGFBR1_TGFBR2
Code
pdb_models$`HP-TGM_TGFBR1`
Code
pdb_models$`HP-TGM_TGFBR2`
Code
pdb_models$`HP-TGM_TGFBR1_TGFBR2`

Complex Binding Energies

In-progress

Binding Interface Analysis

In-progress

Methods

Signal Peptide Removal SignalP 6.0 (fast) was used to predict and trim signal peptides. This prediction prediction model is based on a Bert protein language model encoder and a conditional random field (CRF) decoder.

Sequence Alignment Needleman-Wunsch and Smith-Waterman algorithms were applied using emboss v6.6.0.

profile Hidden Markov Model Analysis profile Hidden Markov Models were created using jackhmmer from HMMER v.3.3, with a maximum of 5 iterations and an E-value threshold of 0.05 against the latest UniProt catalog (as of 2023-03-02). Hidden models were assembled via hmmbuild and hmmsearch was applied against our custom microbial catalog database.

Protein Complex Structure Prediction Alphafold v2.2.0 was applied using the “multimer” setting.

R-Packages

Code
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /central/home/jboktor/miniconda3/envs/pdmbsR/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggsci_2.9                plotly_4.10.1            strex_1.4.4             
 [4] r3dmol_0.2.0             bio3d_2.4-4              viridis_0.6.2           
 [7] viridisLite_0.4.0        progress_1.2.2           listenv_0.8.0           
[10] tictoc_1.1               fs_1.5.2                 future.batchtools_0.10.0
[13] batchtools_0.9.16        future_1.26.1            seqinr_4.2-23           
[16] glue_1.6.2               magrittr_2.0.3           reticulate_1.26         
[19] forcats_0.5.1            stringr_1.4.1            dplyr_1.0.9             
[22] purrr_0.3.5              readr_2.1.2              tidyr_1.2.0             
[25] tibble_3.1.7             ggplot2_3.4.0            tidyverse_1.3.1         

loaded via a namespace (and not attached):
 [1] bit64_4.0.5       lubridate_1.8.0   httr_1.4.3        tools_4.2.0      
 [5] backports_1.4.1   utf8_1.2.2        R6_2.5.1          lazyeval_0.2.2   
 [9] DBI_1.1.2         colorspace_2.0-3  ade4_1.7-19       withr_2.5.0      
[13] tidyselect_1.1.2  gridExtra_2.3     prettyunits_1.1.1 bit_4.0.4        
[17] compiler_4.2.0    textshaping_0.3.6 cli_3.4.1         rvest_1.0.2      
[21] xml2_1.3.3        labeling_0.4.2    scales_1.2.0      checkmate_2.1.0  
[25] rappdirs_0.3.3    systemfonts_1.0.4 digest_0.6.30     rmarkdown_2.18   
[29] pkgconfig_2.0.3   htmltools_0.5.3   parallelly_1.31.1 dbplyr_2.1.1     
[33] fastmap_1.1.0     htmlwidgets_1.5.4 rlang_1.0.6       readxl_1.4.0     
[37] rstudioapi_0.13   farver_2.1.0      generics_0.1.2    jsonlite_1.8.3   
[41] crosstalk_1.2.0   vroom_1.5.7       Matrix_1.5-3      Rcpp_1.0.8.3     
[45] munsell_0.5.0     fansi_1.0.3       lifecycle_1.0.3   stringi_1.7.8    
[49] yaml_2.3.6        MASS_7.3-57       grid_4.2.0        parallel_4.2.0   
[53] crayon_1.5.2      lattice_0.20-45   haven_2.5.0       hms_1.1.1        
[57] knitr_1.40        pillar_1.8.1      base64url_1.4     codetools_0.2-18 
[61] reprex_2.0.1      evaluate_0.18     data.table_1.14.2 modelr_0.1.8     
[65] png_0.1-7         vctrs_0.5.0       tzdb_0.3.0        cellranger_1.1.0 
[69] gtable_0.3.0      assertthat_0.2.1  xfun_0.34         broom_0.8.0      
[73] ragg_1.2.2        globals_0.15.0    ellipsis_0.3.2    brew_1.0-8       

References

Johnston, Chris J. C., Danielle J. Smyth, Ravindra B. Kodali, Madeleine P. J. White, Yvonne Harcus, Kara J. Filbey, James P. Hewitson, et al. 2017. “A Structurally Distinct TGF-β Mimic from an Intestinal Helminth Parasite Potently Induces Regulatory t Cells.” Nature Communications 8 (November): 1741. https://doi.org/10.1038/s41467-017-01886-6.